The Effect of Annotation Scheme Decisions on Parsing Learner Data
Authors
Abstract
We present a study on the dependency parsing of second language learner data, focusing less on parsing techniques and more on the effect of the linguistic distinctions made in the data. In particular, we examine syntactic annotation that relies more on morphological form than on meaning. We investigate the effect of particular linguistic decisions by: 1) converting and transforming a training corpus with a similar annotation scheme, with transformations applied either before or after parsing; 2) supplying different kinds of part-of-speech (POS) information as input; and 3) analyzing the output. While we see a general preference for parsing with more local dependency relations, this seems to be less the case when parsing the data of lower-level learners.
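To make "local dependency relations" concrete, one common way to quantify locality is the linear distance between each dependent and its head. The sketch below is an illustration only, not the authors' method: the `Token` class, the helper names `arc_lengths` and `mean_arc_length`, and the Universal Dependencies-style labels on the toy sentence are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int      # 1-based position of the token in the sentence
    form: str     # surface word form
    head: int     # index of the head token; 0 denotes the artificial root
    deprel: str   # dependency label (illustrative only)


def arc_lengths(sentence):
    """Return |head - dependent| for every arc, skipping the root attachment."""
    return [abs(t.head - t.idx) for t in sentence if t.head != 0]


def mean_arc_length(sentence):
    """Average arc length; smaller values indicate more local dependencies."""
    lengths = arc_lengths(sentence)
    return sum(lengths) / len(lengths) if lengths else 0.0


# Hypothetical toy analysis of "She wants to leave" (labels are assumptions)
sent = [
    Token(1, "She", 2, "nsubj"),
    Token(2, "wants", 0, "root"),
    Token(3, "to", 4, "mark"),
    Token(4, "leave", 2, "xcomp"),
]

print(arc_lengths(sent))                # [1, 1, 2]
print(round(mean_arc_length(sent), 2))  # 1.33
```

Comparing such averages across two annotation variants of the same sentences gives a rough, scheme-level sense of which variant yields more local arcs. It says nothing by itself about parsing accuracy.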
Similar resources
Phrase Structure Annotation and Parsing for Learner English
There has been almost no work on phrase structure annotation and parsing specially designed for learner English despite the fact that they are useful for representing the structural characteristics of learner English. To address this problem, in this paper, we first propose a phrase structure annotation scheme for learner English and annotate two different learner corpora using it. Second, we s...
Inter-annotator Agreement for Dependency Annotation of Learner Language
This paper reports on a study of inter-annotator agreement (IAA) for a dependency annotation scheme designed for learner English. Reliably annotated learner corpora are a necessary step for the development of POS tagging and parsing of learner language. In our study, three annotators marked several layers of annotation over different levels of learner texts, and they were able to obtain generall...
REALEC learner treebank: annotation principles and evaluation of automatic parsing
The paper presents a Universal Dependencies (UD) annotation scheme for a learner English corpus. The REALEC dataset consists of essays written in English by Russian-speaking university students in the course of general English. The original corpus is manually annotated for learners’ errors and gives information on the error span, error type, and the possible correction of the mistake provided b...
The effect of disfluencies and learner errors on the parsing of spoken learner language
NLP tools are typically trained on written data from native speakers. However, research into language acquisition and tools for language teaching and proficiency assessment would benefit from accurate processing of spoken data from second language learners. In this paper we discuss manual annotation schemes for various features of spoken language; we also evaluate the automatic tagging of one par...
Linguistic Issues in Language Technology – LiLT
Parsing learner data poses a great challenge for standard tools, since non-canonical and unusual structures may lead to wrong interpretations on the part of the taggers and parsers. It is well known that providing a statistical parser with perfect part-of-speech (POS) tags is of great benefit for parsing accuracy, and that parsing results can decrease considerably when the parser has to predict...